Static Analysis Meets Distributed Fault-Tolerance: Enabling State-Machine Replication with Nondeterminism
Authors
Abstract
Midas is an interdisciplinary approach to supporting state-machine replication for nondeterministic distributed applications. The approach exploits compile-time static analysis to identify both first-hand and second-hand sources of nondeterminism. Subsequent runtime compensation occurs through either the transfer of nondeterministic checkpoints or the re-execution of inserted code, and restores consistency among replicas before each new client request. The approach avoids the need for lock-step synchronization and leverages application-level insight to address only the nondeterminism that matters. Our preliminary evaluation demonstrates Midas’ feasibility and current performance overheads.
Similar resources
Living with Nondeterminism in Replicated Middleware Applications
Application-level nondeterminism can lead to inconsistent state that defeats the purpose of replication as a fault-tolerance strategy. We present Midas, a new approach for living with nondeterminism in distributed, replicated, middleware applications. Midas exploits (i) the static program analysis of the application’s source code prior to replica deployment and (ii) the online compensation of r...
Byzantine Fault Tolerant Execution of Long-running Distributed Applications
Long-running distributed applications that automate critical decision processes require Byzantine fault tolerance to ensure progress in spite of arbitrary failures. Existing replication protocols for data servers guarantee that externally requested operations execute correctly even if a bounded number of replicas fail arbitrarily. However, since these protocols only support passive state machin...
The State Machine Approach: A Tutorial
The state machine approach is a general method for achieving fault tolerance and implementing decentralized control in distributed systems. This paper reviews the approach and identifies abstractions needed for coordinating ensembles of state machines. Implementations of these abstractions for two different failure models, Byzantine and fail-stop, are discussed. The state machine approach is il...
A Guided Tour on the Theory and Practice of State Machine Replication
This chapter presents the fundamentals and applications of the State Machine Replication (SMR) technique for implementing consistent fault-tolerant services. Our focus here is threefold. First we present some fundamentals about distributed computing and three “practical” SMR protocols for different fault models. Second, we discuss some recent work aiming to improve the performance, modularity a...
Improving the palbimm scheduling algorithm for fault tolerance in cloud computing
Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...